Before we get started, we need to ensure that the required R packages are installed. The code chunk below checks each package: if it is already installed, it is loaded; if not, it is installed first and then loaded into the R environment.
packages <- c('tidyverse','ggdist','gghalves')
for(p in packages){
if(!require(p, character.only = TRUE)){
install.packages(p)
}
library(p, character.only = TRUE)
}
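As a minimal sketch of the check-then-install logic above, the snippet below uses only base R to list which of the required packages are still missing before anything is installed. The helper name `missing_packages` is hypothetical and is not part of any package.

```r
# Packages required by this exercise (same vector as above).
packages <- c("tidyverse", "ggdist", "gghalves")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Listing the missing packages first can be handy on shared machines, where you may want to review what will be installed before calling install.packages().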
The code chunk below imports Exam_data.csv into the R environment using the read_csv() function of the readr package.
exam_data <- read_csv("./data/Exam_data.csv")
The code chunk below adds the aesthetic element to the plot. Notice that ggplot() now draws the x-axis and its label, but no data, because no geom has been added yet.
ggplot(data=exam_data, aes(x= MATHS))
The code chunk below plots a bar chart.
ggplot(data=exam_data,
aes(x=RACE)) +
geom_bar()
In a dot plot, the width of a dot corresponds to the bin width (or maximum width, depending on the binning algorithm), and dots are stacked, with each dot representing one observation.
ggplot(data=exam_data,
aes(x = MATHS)) +
geom_dotplot(dotsize = 0.5)
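The code chunk below performs two steps: the binwidth argument sets the bin width to 2.5, and scale_y_continuous() turns off the y-axis, whose values are not meaningful for a dot plot.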
ggplot(data=exam_data, aes(x = MATHS)) +
geom_dotplot(binwidth=2.5, dotsize = 0.5) +
scale_y_continuous(NULL, breaks = NULL)
In the code chunk below, geom_histogram() is used to create a simple histogram from the values in the MATHS field of exam_data.
ggplot(data=exam_data,
aes(x = MATHS)) +
geom_histogram()
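In the code chunk below, the bins argument sets the number of bins to 20, the fill argument shades the bars light blue, and the color argument draws the bar outlines in black.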
ggplot(data=exam_data,
aes(x= MATHS)) +
geom_histogram(bins=20,
color="black",
fill="light blue")
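To see what the bins argument actually does, it can help to inspect the values that geom_histogram() computes before drawing. The sketch below is a minimal illustration on simulated scores; `sim_scores` is an invented stand-in for exam_data, not the real data set.

```r
library(ggplot2)

# Simulated stand-in for the MATHS column (an assumption, not the real data).
set.seed(1234)
sim_scores <- data.frame(MATHS = round(runif(300, min = 0, max = 100)))

p <- ggplot(data = sim_scores, aes(x = MATHS)) +
  geom_histogram(bins = 20, color = "black", fill = "light blue")

# layer_data() returns the values computed for the first layer:
# one row per bin, with the bin edges (xmin, xmax) and the count in each bin.
bin_table <- layer_data(p, 1)[, c("xmin", "xmax", "count")]
head(bin_table)

# Every observation falls in exactly one bin, so the counts sum to n.
sum(bin_table$count) == nrow(sim_scores)
```

Changing bins and re-running the last lines shows how the bin edges widen or narrow while the total count stays the same.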
ggplot(data=exam_data,
aes(x= MATHS,
fill = GENDER)) +
geom_histogram(bins=20,
color="grey30")
geom_density() computes and plots a kernel density estimate, which is a smoothed version of the histogram.
It is a useful alternative to the histogram for continuous data that comes from an underlying smooth distribution.
The code below plots the distribution of Maths scores in a kernel density estimate plot.
ggplot(data=exam_data,
aes(x = MATHS)) +
geom_density()
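How smooth the curve looks depends on the bandwidth of the kernel. The sketch below uses base R's density(), which computes a kernel density estimate of the same kind, on simulated scores (`sim_maths` is an invented stand-in for the MATHS column) to show the effect of the adjust multiplier.

```r
# Simulated stand-in for the MATHS scores (an assumption, not the real data).
set.seed(1234)
sim_maths <- rnorm(300, mean = 65, sd = 15)

# Default bandwidth versus a narrower one (adjust < 1 gives a wigglier curve).
kde_default <- density(sim_maths)
kde_narrow  <- density(sim_maths, adjust = 0.5)

# A density curve integrates to approximately 1 over its range.
step <- diff(kde_default$x)[1]
area <- sum(kde_default$y) * step
round(area, 2)

# Bandwidths actually used by the two estimates.
c(default = kde_default$bw, narrow = kde_narrow$bw)
```

geom_density() accepts an adjust argument as well, so geom_density(adjust = 0.5) is one way to try a less smoothed curve in the plot above.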
The code chunk below plots two kernel density lines, one per gender, by mapping GENDER to the colour (or fill) argument of aes().
ggplot(data=exam_data,
aes(x = MATHS, colour = GENDER)) +
geom_density()
The code chunk below plots boxplots by using geom_boxplot().
ggplot(data=exam_data,
aes(y = MATHS,
x= GENDER)) +
geom_boxplot()
Notches are used in box plots to help visually assess whether the medians of distributions differ. If the notches do not overlap, this is evidence that the medians are different. The code chunk below plots the distribution of Maths scores by gender in a notched box plot instead of a plain boxplot.
ggplot(data=exam_data,
aes(y = MATHS, x= GENDER)) +
geom_boxplot(notch=TRUE)
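The ggplot2 documentation for geom_boxplot() describes the notches as extending 1.58 * IQR / sqrt(n) on either side of the median. The sketch below works through that arithmetic with base R on simulated scores (`sim_maths` is invented). Note that base R's boxplot.stats() uses hinges rather than quantile-based quartiles, so the values may differ marginally from what ggplot2 draws.

```r
# Simulated stand-in for one gender's MATHS scores (an assumption).
set.seed(1234)
sim_maths <- round(rnorm(150, mean = 65, sd = 15))

stats <- boxplot.stats(sim_maths)

# stats$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker.
hinge_spread <- stats$stats[4] - stats$stats[2]
med <- stats$stats[3]
n <- stats$n

# Notch limits: median -/+ 1.58 * IQR / sqrt(n).
notch <- med + c(-1.58, 1.58) * hinge_spread / sqrt(n)
notch

# boxplot.stats() reports the same interval in its conf element.
stats$conf
```

Because the notch width shrinks with sqrt(n), larger groups get narrower notches, which is why small groups often show wide or even folded-back notches.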
The code chunk below plots the data points on the boxplots by using both geom_boxplot() and geom_point().
ggplot(data=exam_data,
aes(y = MATHS,
x= GENDER)) +
geom_boxplot() +
geom_point(position="jitter",
size = 0.5)
Violin plots are a way of comparing multiple data distributions. With ordinary density curves, it is difficult to compare more than just a few distributions because the lines visually interfere with each other. With a violin plot, it’s easier to compare several distributions since they’re placed side by side. The code below plots the distribution of Maths scores by gender in a violin plot.
ggplot(data=exam_data,
aes(y = MATHS,
x= GENDER)) +
geom_violin()
The code chunk below combines a violin plot and a boxplot to show the distribution of Maths scores by gender.
ggplot(data=exam_data,
aes(y = MATHS,
x= GENDER)) +
geom_violin(fill="light blue") +
geom_boxplot(alpha=0.5)
The code chunk below plots a scatterplot showing the Maths and English grades of pupils by using geom_point().
ggplot(data=exam_data,
aes(x= MATHS,
y=ENGLISH)) +
geom_point()
The code chunk below adds mean values to the boxplots by using the stat_summary() function and overriding its default geom.
ggplot(data=exam_data,
aes(y = MATHS, x= GENDER)) +
geom_boxplot() +
stat_summary(geom = "point",
fun = "mean",
colour ="red",
size=4)
The code chunk below adds mean values by using a geom_() function and overriding its default stat.
ggplot(data=exam_data,
aes(y = MATHS, x= GENDER)) +
geom_boxplot() +
geom_point(stat="summary",
fun = "mean",
colour ="red",
size=4)
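The red points mark one summary value per group. As a quick cross-check, the same group means can be computed directly in base R. The sketch below uses a tiny invented data frame whose column names mirror exam_data; the values are made up for illustration.

```r
# Invented stand-in data (column names mirror exam_data; values are made up).
sim_exam <- data.frame(
  GENDER = c("Female", "Female", "Female", "Male", "Male", "Male"),
  MATHS  = c(70, 80, 90, 50, 60, 85)
)

# Mean Maths score per gender: the values the red points would mark.
group_means <- tapply(sim_exam$MATHS, sim_exam$GENDER, mean)
group_means
```

Comparing these numbers with the position of the points is a simple way to confirm that the summary function was applied as intended.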
In the code chunk below, geom_smooth() is used to plot a best-fit curve on the scatterplot. The default method is loess (for data sets with fewer than 1,000 observations).
ggplot(data=exam_data,
aes(x= MATHS, y=ENGLISH)) +
geom_point() +
geom_smooth(linewidth = 0.5)
The default smoothing method can be overridden as shown below.
ggplot(data=exam_data,
aes(x= MATHS, y=ENGLISH)) +
geom_point() +
geom_smooth(method = lm, linewidth = 0.5)
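With method = lm and the default formula y ~ x, the line drawn corresponds to an ordinary linear regression of ENGLISH on MATHS. The sketch below fits that model directly on simulated data (`sim_exam` and its coefficients are invented) to show where the slope and intercept come from.

```r
# Simulated stand-in data: ENGLISH depends linearly on MATHS plus noise
# (the intercept 25 and slope 0.6 are invented for illustration).
set.seed(1234)
sim_exam <- data.frame(MATHS = runif(200, min = 20, max = 100))
sim_exam$ENGLISH <- 25 + 0.6 * sim_exam$MATHS + rnorm(200, sd = 8)

# The same straight-line model that geom_smooth(method = lm) fits.
fit <- lm(ENGLISH ~ MATHS, data = sim_exam)
coef(fit)

# Predicted English score for a pupil scoring 50 in Maths.
pred_50 <- predict(fit, newdata = data.frame(MATHS = 50))
pred_50
```

Fitting the model separately is useful when you need the numeric coefficients, since the plot shows only the fitted line and its confidence band.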
The code chunk below plots a trellis plot using facet_wrap().
ggplot(data=exam_data,
aes(x= MATHS)) +
geom_histogram(bins=20) +
facet_wrap(~ CLASS)
The code chunk below plots a trellis plot using facet_grid().
ggplot(data=exam_data,
aes(x= MATHS)) +
geom_histogram(bins=20) +
facet_grid(~ CLASS)
The code chunk below plots a trellis boxplot of Maths scores by class, with one column of panels per gender.
ggplot(data=exam_data,
aes(y = MATHS, x= CLASS)) +
geom_boxplot() +
facet_grid(~ GENDER)
The code chunk below plots a trellis boxplot of Maths scores by class, with one row of panels per gender.
ggplot(data=exam_data,
aes(y = MATHS, x= CLASS)) +
geom_boxplot() +
facet_grid(GENDER ~.)
The code chunk below plots a trellis boxplot of Maths scores by gender, with gender in rows and class in columns.
ggplot(data=exam_data,
aes(y = MATHS, x= GENDER)) +
geom_boxplot() +
facet_grid(GENDER ~ CLASS)
By default, ggplot2 draws bar charts vertically. The code chunk below flips the vertical bar chart into a horizontal bar chart by using coord_flip().
ggplot(data=exam_data,
aes(x=RACE)) +
geom_bar() +
coord_flip()
The code chunk below fixes both the y-axis and x-axis ranges to 0-100.
ggplot(data=exam_data,
aes(x= MATHS, y=ENGLISH)) +
geom_point() +
geom_smooth(method = lm, linewidth = 0.5) +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100))
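coord_cartesian() zooms the view without changing the underlying data, whereas setting limits on a scale (for example with xlim()) discards out-of-range values before any statistics are computed. The sketch below contrasts the two on simulated data (`sim_exam` is invented), using a deliberately narrow range so the difference is visible.

```r
library(ggplot2)

# Simulated stand-in data (an assumption, not the real exam_data).
set.seed(1234)
sim_exam <- data.frame(MATHS = runif(200, min = 0, max = 100))
sim_exam$ENGLISH <- 25 + 0.6 * sim_exam$MATHS + rnorm(200, sd = 8)

base_plot <- ggplot(sim_exam, aes(x = MATHS, y = ENGLISH)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)

# Zoom only: all 200 points still feed the regression line.
p_zoom <- base_plot + coord_cartesian(xlim = c(50, 100))

# Scale limits: points with MATHS < 50 are dropped (with a warning)
# before the line is fitted, so the fit can change.
p_clip <- base_plot + xlim(50, 100)

# x-range over which each regression line was computed.
zoom_range <- range(layer_data(p_zoom, 2)$x)
clip_range <- suppressWarnings(range(layer_data(p_clip, 2)$x))
zoom_range
clip_range
```

For a full 0-100 range on exam scores the two approaches look alike, but coord_cartesian() is the safer choice whenever a fitted line or summary should reflect all the data.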
The code chunk below plots a horizontal bar chart using theme_gray().
ggplot(data=exam_data,
aes(x=RACE)) +
geom_bar() +
coord_flip() +
theme_gray()
A horizontal bar chart plotted using theme_classic().
ggplot(data=exam_data,
aes(x=RACE)) +
geom_bar() +
coord_flip() +
theme_classic()
A horizontal bar chart plotted using theme_minimal().
ggplot(data=exam_data,
aes(x=RACE)) +
geom_bar() +
coord_flip() +
theme_minimal()
The code chunk below plots a horizontal bar chart with theme_minimal(), then changes the plot panel background to light blue and the grid lines to white.
ggplot(data=exam_data, aes(x=RACE)) +
geom_bar() +
coord_flip() +
theme_minimal() +
theme(panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
linewidth = 0.5,
linetype = "solid"),
panel.grid.major = element_line(linewidth = 0.5,
linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(linewidth = 0.25,
linetype = 'solid',
colour = "white"))
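A simple vertical bar chart for frequency analysis can be made more informative. The code chunk below sorts the bars by frequency, labels each bar with its count and percentage share, and gives the axes clearer titles.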
ggplot(data=exam_data,
aes(x=reorder(RACE,RACE, function(x)-length(x))))+
geom_bar() +
ylim(0,220) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1) +
xlab("Race") +
ylab("No. of\nPupils") +
theme(axis.title.y=element_text(angle = 0))
This code chunk produces the same chart using fct_infreq() of the forcats package to order the bars by frequency.
exam_data %>%
mutate(RACE = fct_infreq(RACE)) %>%
ggplot(aes(x = RACE)) +
geom_bar()+ ylim(0,220) +
geom_text(stat="count",
aes(label=paste0(after_stat(count), ", ",
round(after_stat(count)/sum(after_stat(count))*100, 1), "%")),
vjust=-1) +
xlab("Race") +
ylab("No. of\nPupils") +
theme(axis.title.y=element_text(angle = 0))
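Both versions above rely on two small computations: ordering the categories by descending frequency, and building a "count, percent%" label for each bar. The sketch below reproduces them in base R on an invented vector of race labels (`sim_race` is a made-up stand-in for exam_data$RACE).

```r
# Invented race labels standing in for exam_data$RACE.
sim_race <- c("Chinese", "Malay", "Chinese", "Indian", "Chinese",
              "Malay", "Chinese", "Others", "Malay", "Chinese")

# Frequency table sorted from most to least common: the bar order
# that the sorted bar charts above aim for.
counts <- sort(table(sim_race), decreasing = TRUE)
names(counts)

# Bar labels in the "count, percent%" form used by geom_text() above.
n <- as.vector(counts)
bar_labels <- paste0(n, ", ", round(n / sum(n) * 100, 1), "%")
bar_labels
```

Working through the labels outside the plot makes it easier to adjust the rounding or the separator before wiring the expression into aes().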
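The code chunk below adds dashed reference lines for the mean (red) and the median (grey) to the histogram of Maths scores.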
ggplot(data=exam_data,
aes(x= MATHS)) +
geom_histogram(bins=20,
color="black",
fill="light blue") +
geom_vline(aes(xintercept=mean(MATHS, na.rm=TRUE)),
color="red",
linetype="dashed",
linewidth=1) +
geom_vline(aes(xintercept=median(MATHS, na.rm=TRUE)),
color="grey30",
linetype="dashed",
linewidth=1)
The histograms are elegantly designed but not informative, because they reveal only the distribution of English scores by gender, without the context of all pupils.
The makeover histograms are not only elegantly designed but also informative, because they show the distribution of English scores by gender against the distribution for all pupils.
The code chunk below creates the makeover design. Note that the second line creates the so-called background data: the full data set without the third column (GENDER).
d <- exam_data
d_bg <- d[, -3]
ggplot(d, aes(x = ENGLISH,
fill = GENDER)) +
geom_histogram(data = d_bg,
fill = "grey",
alpha = .5) +
geom_histogram(colour = "black") +
facet_wrap(~ GENDER) +
guides(fill = "none") +
theme_bw()
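The background trick works because a layer whose data lacks the faceting variable is drawn in every panel. The sketch below checks this on simulated data (`sim_exam` and `sim_bg` are invented stand-ins) and drops GENDER by name rather than by column position, which is one way to stay robust if the column order ever changes.

```r
library(ggplot2)

# Simulated stand-in data: 100 pupils per gender (values are invented).
set.seed(1234)
sim_exam <- data.frame(
  GENDER  = rep(c("Female", "Male"), each = 100),
  ENGLISH = round(c(rnorm(100, 70, 10), rnorm(100, 62, 12)))
)

# Background data: everything except the faceting variable, dropped by name.
sim_bg <- sim_exam[, setdiff(names(sim_exam), "GENDER"), drop = FALSE]

p <- ggplot(sim_exam, aes(x = ENGLISH, fill = GENDER)) +
  geom_histogram(data = sim_bg, fill = "grey", alpha = 0.5, bins = 20) +
  geom_histogram(colour = "black", bins = 20) +
  facet_wrap(~ GENDER) +
  guides(fill = "none") +
  theme_bw()

# Layer 1 (background) has no GENDER column, so it appears in both panels;
# layer 2 is split by GENDER, so each panel gets only its own pupils.
bg <- layer_data(p, 1)
fg <- layer_data(p, 2)
bg_counts <- tapply(bg$count, bg$PANEL, sum)
fg_counts <- tapply(fg$count, fg$PANEL, sum)
bg_counts
fg_counts
```

The per-panel totals show the grey layer counting all pupils in each facet while the coloured layer counts only one gender.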
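The code chunk below creates the makeover: a scatterplot of English against Maths scores, with both axes fixed to 0-100 and dashed reference lines at 50.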
ggplot(data=exam_data,
aes(x=MATHS, y=ENGLISH)) +
geom_point() +
coord_cartesian(xlim=c(0,100),
ylim=c(0,100)) +
geom_hline(yintercept=50,
linetype="dashed",
color="grey60",
linewidth=1) +
geom_vline(xintercept=50,
linetype="dashed",
color="grey60",
linewidth=1)
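The code chunk below installs the introdataviz package from GitHub (this requires the devtools package) and uses its geom_split_violin() to plot split violins of Maths scores by race and gender.
devtools::install_github("psyteachr/introdataviz")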
ggplot(exam_data,
aes(x = RACE,
y = MATHS,
fill = GENDER)) +
introdataviz::geom_split_violin(alpha = .4,
trim = FALSE) +
geom_boxplot(width = .2,
alpha = .6,
fatten = NULL,
show.legend = FALSE) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
show.legend = FALSE,
position = position_dodge(.175)) +
scale_y_continuous(breaks = seq(0, 100, 20),
limits = c(0, 100)) +
scale_fill_brewer(palette = "Dark2",
name = "Gender")
This hands-on exercise introduces the ggdist package. You will learn how to create raincloud plots as shown on Slide 23 of Lesson 1. First, stat_halfeye() of the ggdist package is used to create a half violin plot on the right of the vertical axis.
ggplot(exam_data,
aes(x = RACE,
y = MATHS)) +
scale_y_continuous(breaks = seq(0, 100, 20),
limits = c(0, 100)) +
stat_halfeye(adjust = .33,
width = .67,
color = NA,
justification = -0.01,
position = position_nudge( x = .15) )
Next, stat_dots() of the ggdist package is used to add a dot plot on the left.
ggplot(exam_data,
aes(x = RACE,
y = MATHS)) +
scale_y_continuous(breaks = seq(0, 100, 20),
limits = c(0, 100)) +
stat_halfeye(adjust = .33,
width = .67,
color = NA,
justification = -.01,
position = position_nudge( x = .15) ) +
stat_dots(side = "left",
justification = 1.1,
binwidth = .25,
dotsize = 5)
Lastly, coord_flip() of ggplot2 is used to rotate the vertical raincloud plots into horizontal ones.
ggplot(exam_data,
aes(x = RACE, y = MATHS)) +
scale_y_continuous(breaks = seq(0, 100, 20),
limits = c(0, 100)) +
stat_halfeye(adjust = .33,
width = .67,
color = NA,
justification = -.01,
position = position_nudge( x = .15) ) +
stat_dots(side = "left",
justification = 1.1,
binwidth = .25,
dotsize = 5) +
coord_flip()
In this alternative design, boxplots are added by using geom_boxplot() of ggplot2.
ggplot(exam_data,
aes(x = RACE,
y = MATHS)) +
scale_y_continuous(breaks = seq(0, 100, 20),
limits = c(0, 100)) +
stat_halfeye(adjust = .33,
width = .67,
color = NA,
justification = -.01,
position = position_nudge( x = .15) ) +
geom_boxplot( width = .25,
outlier.shape = NA ) +
stat_dots(side = "left",
justification = 1.2,
binwidth = .25,
dotsize = 5) +
coord_flip()
This hands-on exercise introduces ggridges, a ggplot2 extension specially designed to create ridge plots. The ggridges package provides two main geoms, namely:
geom_ridgeline and geom_density_ridges. The former takes height values directly to draw ridgelines, and the latter first estimates data densities and then draws those using ridgelines.
The code chunk below uses geom_density_ridges() to create a basic ridge density plot.
library(ggridges)
ggplot(exam_data,
aes(x = MATHS,
y = CLASS)) +
geom_density_ridges()
library(ggridges)
ggplot(exam_data,
aes(x = MATHS,
y = CLASS)) +
geom_density_ridges(rel_min_height = 0.01)
library(ggridges)
ggplot(exam_data,
aes(x = MATHS,
y = CLASS)) +
geom_density_ridges(rel_min_height = 0.01,
scale = 1)
In the code chunk below, stat_density_ridges() is used to create a probability ridge plot.
ggplot(exam_data,
aes(x = MATHS, y = CLASS,
fill = 0.5 - abs(0.5 - after_stat(ecdf)))) +
stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = TRUE,
rel_min_height = 0.001) +
scale_fill_viridis_c(name = "Tail probability",
direction = -1)
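The fill expression 0.5 - abs(0.5 - ecdf) turns a cumulative probability into a tail probability: it is close to 0 at either extreme and peaks at 0.5 around the median. The sketch below applies the same transformation to an empirical CDF from base R on nine invented scores. ggridges evaluates the ECDF along the density curve instead, so this is only an illustration of the arithmetic.

```r
# Nine invented scores, already sorted for readability.
x <- c(35, 48, 52, 60, 66, 71, 75, 83, 94)

# Empirical cumulative probability of each score: 1/9, 2/9, ..., 1.
ecdf_x <- ecdf(x)(x)

# Tail probability: distance from the nearer end of the distribution.
tail_prob <- 0.5 - abs(0.5 - ecdf_x)
round(tail_prob, 3)
```

Scores near the middle of the distribution get the largest values and the highest score gets exactly 0, which is why the colour gradient in the ridge plot highlights both tails symmetrically.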